Four years ago, an argument was raised to stop relying 100% on null hypothesis significance testing (NHST) and its p-value. A very appealing alternative to NHST is Bayesian statistics, which in itself contains many approaches to statistical inference. In this post, I provide an introductory and practical tutorial on Bayesian parameter estimation in the context of comparing two independent groups' data, based on an adaptation of UC's lecture and Kruschke's textbook (Chapter 16).
One example of a two-group problem is testing the effect of a drug, where one group receives a placebo and the other receives the drug. The response variable is the result of an IQ test.
In this example we estimate the mean and standard deviation of the two groups, first using a normal distribution and then a robust \(t\)-distribution with a common normality parameter \(\nu\).
dta <- read.csv("data/IQdrug.csv")
DT::datatable(dta)
y <- as.numeric(dta[,"Score"])
x <- as.numeric(as.factor(dta[,"Group"]))
(xLevels <- levels(as.factor(dta[,"Group"])))
[1] "Placebo"    "Smart Drug"
Ntotal = length(y)
# Specify the data in a list for JAGS/Stan:
dataList = list(
  y = y,
  x = x,
  Ntotal = Ntotal,
  meanY = mean(y),
  sdY = sd(y)
)
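As an aside, if the IQdrug.csv file is not at hand, a small simulated stand-in can replace the read.csv() step above so the rest of the code runs; the column names and group labels below match the real data, but the group sizes and distribution parameters are assumptions for illustration only.
# Hypothetical stand-in for data/IQdrug.csv (sizes and parameters assumed)
set.seed(1)
dta <- data.frame(
  Group = rep(c("Placebo", "Smart Drug"), each = 60),  # group sizes assumed
  Score = c(rnorm(60, mean = 100, sd = 18),            # placebo-like scores
            rnorm(60, mean = 108, sd = 26))            # drug-like scores
)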
Stan
#source("../DBDA2Eprograms/DBDA2E-utilities.R")
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
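Here options(mc.cores = ...) lets the chains run in parallel on the available cores, and rstan_options(auto_write = TRUE) caches the compiled model on disk so repeated calls do not recompile it.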
Assuming that both groups' samples follow normal distributions, estimate the parameters and compare them.
Write the model description for Stan:
\[ y_{i\mid j} \sim N(\mu_j, \sigma^2_j) \\ \mu_j \sim N(\bar{Y}, (100 \cdot SD_Y)^2) \\ \sigma_j \sim uniform(\frac{SD_Y}{1000}, SD_Y \cdot 1000) \]
# Use the description for Stan from file "ch16.2.stan"
= "
modelString data {
int<lower=1> Ntotal;
int x[Ntotal];
real y[Ntotal];
real meanY;
real sdY;
}
transformed data {
real unifLo;
real unifHi;
real normalSigma;
unifLo = sdY/1000;
unifHi = sdY*1000;
normalSigma = sdY*100;
}
parameters {
real mu[2];
real<lower=0> sigma[2];
}
model {
sigma ~ uniform(unifLo, unifHi);
mu ~ normal(meanY, normalSigma);
for ( i in 1:Ntotal ) {
y[i] ~ normal(mu[x[i]] , sigma[x[i]]);
}
}
"
If running the description in modelString for the first time, create stanDsoNormal; otherwise reuse it.
stanDsoNormal <- stan_model(model_code=modelString)
If a saved DSO is used, load it, then run the chains.
# saveRDS(stanDsoNormal, file="data/DSONormal1.Rds")
stanDsoNormal <- readRDS("data/DSONormal1.Rds")
Run MCMC.
= c("mu","sigma") # The parameters to be monitored
parameters = 500 # Number of steps to "tune" the samplers
adaptSteps = 1000
burnInSteps = 4
nChains = 1
thinSteps = 5000
numSavedSteps <- sampling(stanDsoNormal,
stanFitNormal data = dataList,
pars = parameters, # optional
chains = nChains,
iter = (ceiling(numSavedSteps/nChains)*thinSteps+burnInSteps),
warmup = burnInSteps,
init = "random", # optional
thin = thinSteps)
Save the fit object for further analysis.
# save(stanFitNormal,file="data/StanNormalFit2Groups.Rdata")
load("data/StanNormalFit2Groups.Rdata")
Then, we can explore the results.
# text statistics:
print(stanFitNormal)
Inference for Stan model: a4cb1c74477f9a8b1e476a89a6556245.
4 chains, each with iter=2250; warmup=1000; thin=1;
post-warmup draws per chain=1250, total post-warmup draws=5000.
mean se_mean sd 2.5% 25% 50% 75% 97.5%
mu[1] 100.04 0.03 2.43 95.33 98.39 100.01 101.71 104.76
mu[2] 107.79 0.05 3.36 101.37 105.50 107.78 110.00 114.44
sigma[1] 18.34 0.03 1.81 15.28 17.07 18.22 19.44 22.38
sigma[2] 26.02 0.03 2.44 21.77 24.31 25.80 27.58 31.23
lp__ -423.27 0.03 1.47 -426.95 -424.00 -422.93 -422.20 -421.44
n_eff Rhat
mu[1] 5453 1
mu[2] 5035 1
sigma[1] 5004 1
sigma[2] 5772 1
lp__ 2598 1
Samples were drawn using NUTS(diag_e) at Mon Jan 30 00:57:34 2023.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
# estimates & hdi:
plot(stanFitNormal)
# samples
rstan::traceplot(stanFitNormal, ncol=1, inc_warmup=F)
pairs(stanFitNormal, pars=c('mu','sigma'))
stan_scat(stanFitNormal, c('mu[1]','mu[2]'))
stan_scat(stanFitNormal, c('mu[1]','sigma[1]'))
stan_scat(stanFitNormal, c('mu[1]','sigma[2]'))
stan_scat(stanFitNormal, c('mu[2]','sigma[1]'))
stan_scat(stanFitNormal, c('mu[2]','sigma[2]'))
stan_scat(stanFitNormal, c('sigma[1]','sigma[2]'))
stan_hist(stanFitNormal)
stan_dens(stanFitNormal)
# autocorrelation:
stan_ac(stanFitNormal, separate_chains = T)
stan_diag(stanFitNormal,information = "sample",chain=0)
stan_diag(stanFitNormal,information = "stepsize",chain = 0)
stan_diag(stanFitNormal,information = "treedepth",chain = 0)
stan_diag(stanFitNormal,information = "divergence",chain = 0)
If we prefer output using the coda class, reformat the chains into a coda object:
library(coda)
stan2coda <- function(fit) {
  # convert each chain of the stanfit object to an mcmc object, then bundle them
  mcmc.list(lapply(1:ncol(fit), function(x) mcmc(as.array(fit)[,x,])))
}
codaSamples <- stan2coda(stanFitNormal)
summary(codaSamples)
Iterations = 1:1250
Thinning interval = 1
Number of chains = 4
Sample size per chain = 1250
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
mu[1] 100.04 2.431 0.03438 0.03166
mu[2] 107.79 3.357 0.04748 0.04742
sigma[1] 18.34 1.806 0.02554 0.02516
sigma[2] 26.02 2.441 0.03452 0.03252
lp__ -423.27 1.465 0.02072 0.02860
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
mu[1] 95.33 98.39 100.01 101.71 104.76
mu[2] 101.37 105.50 107.78 110.00 114.44
sigma[1] 15.28 17.07 18.22 19.44 22.38
sigma[2] 21.77 24.31 25.80 27.58 31.23
lp__ -426.95 -424.00 -422.93 -422.20 -421.44
plot(codaSamples)
autocorr.plot(codaSamples)
effectiveSize(codaSamples)
mu[1] mu[2] sigma[1] sigma[2] lp__
6113.262 5058.326 5239.336 5666.196 2646.550
gelman.diag(codaSamples)
Potential scale reduction factors:
Point est. Upper C.I.
mu[1] 1 1
mu[2] 1 1
sigma[1] 1 1
sigma[2] 1 1
lp__ 1 1
Multivariate psrf
1
gelman.plot(codaSamples)
plot(density(codaSamples[[1]][,1]),xlim=c(10,120),ylim=c(0,.25),main="Posterior Densities") # mu[1], 1st chain
lines(density(codaSamples[[1]][,2])) # mu[2], 1st chain
lines(density(codaSamples[[1]][,3])) # sigma[1], 1st chain
lines(density(codaSamples[[1]][,4])) # sigma[2], 1st chain
lines(density(codaSamples[[2]][,1]),col="red") # mu[1], 2nd chain
lines(density(codaSamples[[2]][,2]),col="red") # mu[2], 2nd chain
lines(density(codaSamples[[2]][,3]),col="red") # sigma[1], 2nd chain
lines(density(codaSamples[[2]][,4]),col="red") # sigma[2], 2nd chain
Or you can use shinystan to analyze the fitted model.
library(shinystan)
Launch shiny application with the loaded object.
launch_shinystan(stanFitNormal)
Now use the robust assumption of a Student-\(t\) distribution instead of the normal distribution.
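The model mirrors the normal one, but the likelihood becomes a Student-\(t\) with a normality parameter \(\nu\) shared by both groups, and \(\nu - 1\) gets an exponential prior (matching the Stan code below):
\[ y_{i\mid j} \sim t_{\nu}(\mu_j, \sigma_j) \\ \nu = \nu' + 1, \quad \nu' \sim exponential(1/29) \]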
= "
modelString data {
int<lower=1> Ntotal;
int x[Ntotal];
real y[Ntotal];
real meanY;
real sdY;
}
transformed data {
real unifLo;
real unifHi;
real normalSigma;
real expLambda ;
unifLo = sdY/1000;
unifHi = sdY*1000;
normalSigma = sdY*100;
expLambda = 1/29.0;
}
parameters {
real<lower=0> nuMinusOne;
real mu[2] ; // 2 groups
real<lower=0> sigma[2] ; // 2 groups
}
transformed parameters {
real<lower=0> nu ;
nu = nuMinusOne + 1 ;
}
model {
sigma ~ uniform( unifLo , unifHi ) ; // vectorized 2 groups
mu ~ normal( meanY , normalSigma ) ; // vectorized 2 groups
nuMinusOne ~ exponential( expLambda ) ;
for ( i in 1:Ntotal ) {
y[i] ~ student_t(nu, mu[x[i]] ,sigma[x[i]]) ; // nested index of group
}
}
"
If running the description in modelString for the first time, create stanDsoRobust; otherwise reuse it.
stanDsoRobust <- stan_model(model_code=modelString)
If a saved DSO is used, load it, then run the chains.
# saveRDS(stanDsoRobust, file="data/DSORobust1.Rds")
stanDsoRobust <- readRDS(file="data/DSORobust1.Rds")
If necessary, initialize the chains. The parameter init can be one of: the digit 0, the string "0", the string "random", a function that returns a named list, or a list of initial parameter values specifying how the chains should start.
Since Stan has a pretty complicated tuning process during which, among other things, it selects initial values, it may be a good idea to let Stan pick the defaults until we get enough experience.
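For example, here is a minimal sketch of explicit initialization via a function that returns a named list; the starting values below are arbitrary assumptions, not recommendations:
# Hypothetical explicit initial values for the robust model
initsList <- function() {
  list(mu = c(mean(y), mean(y)),  # start both group means at the grand mean
       sigma = c(sd(y), sd(y)),   # start both group SDs at the overall SD
       nuMinusOne = 4)            # arbitrary starting value for nu - 1
}
# then pass init = initsList to sampling()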
Run MCMC.
= c( "mu" , "sigma" , "nu" ) # The parameters to be monitored
parameters = 500 # Number of steps to "tune" the samplers
adaptSteps = 1000
burnInSteps = 4
nChains = 1
thinSteps <-5000
numSavedSteps# Get MC sample of posterior:
<- sampling( object=stanDsoRobust ,
stanFitRobust data = dataList ,
pars = parameters , # optional
chains = nChains ,
cores=nChains,
iter = (ceiling(numSavedSteps/nChains)*thinSteps
+ burnInSteps ) ,
warmup = burnInSteps ,
init = "random" , # optional
thin = thinSteps )
Save the fit object.
# save(stanFitRobust,file="data/StanRobustFit2Groups.Rdata")
load("data/StanRobustFit2Groups.Rdata")
Explore the results.
print(stanFitRobust)
Inference for Stan model: 3b61ff01c98dc80a0961c8bae33b0b6e.
4 chains, each with iter=2250; warmup=1000; thin=1;
post-warmup draws per chain=1250, total post-warmup draws=5000.
mean se_mean sd 2.5% 25% 50% 75% 97.5%
mu[1] 99.28 0.03 1.76 95.89 98.11 99.32 100.47 102.63
mu[2] 107.14 0.04 2.69 101.78 105.36 107.10 108.95 112.28
sigma[1] 11.32 0.03 1.75 8.28 10.11 11.21 12.40 15.06
sigma[2] 17.87 0.04 2.71 12.99 16.01 17.72 19.57 23.55
nu 3.87 0.03 1.71 1.91 2.82 3.51 4.42 8.15
lp__ -451.38 0.03 1.67 -455.61 -452.22 -451.03 -450.15 -449.20
n_eff Rhat
mu[1] 4781 1
mu[2] 5127 1
sigma[1] 4350 1
sigma[2] 4679 1
nu 3694 1
lp__ 2314 1
Samples were drawn using NUTS(diag_e) at Mon Jan 30 01:00:14 2023.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
plot(stanFitRobust)
rstan::traceplot(stanFitRobust, ncol=1, inc_warmup=F)
pairs(stanFitRobust, pars=c('nu','mu','sigma'))
stan_scat(stanFitRobust, c('nu','mu[1]'))
stan_scat(stanFitRobust, c('nu','mu[2]'))
stan_scat(stanFitRobust, c('nu','sigma[1]'))
stan_scat(stanFitRobust, c('nu','sigma[2]'))
stan_scat(stanFitRobust, c('mu[1]','sigma[1]'))
stan_scat(stanFitRobust, c('sigma[1]','sigma[2]'))
stan_hist(stanFitRobust)
stan_dens(stanFitRobust)
stan_ac(stanFitRobust, separate_chains = T)
stan_diag(stanFitRobust,information = "sample",chain=0)
stan_diag(stanFitRobust,information = "stepsize",chain = 0)
stan_diag(stanFitRobust,information = "treedepth",chain = 0)
stan_diag(stanFitRobust,information = "divergence",chain = 0)
launch_shinystan(stanFitRobust)
For comparison, decide whether the two groups are different using the Frequentist approach.
Use a t-test with unequal variances:
qqnorm(y[x==1])
qqline(y[x==1])
qqnorm(y[x==2])
qqline(y[x==2])
t.test(Score ~ Group , data=dta, var.equal=FALSE, paired=FALSE)
Welch Two Sample t-test
data: Score by Group
t = -1.958, df = 111.44, p-value = 0.05273
alternative hypothesis: true difference in means between group Placebo and group Smart Drug is not equal to 0
95 percent confidence interval:
-15.70602585 0.09366161
sample estimates:
mean in group Placebo mean in group Smart Drug
100.0351 107.8413
Result: at the 5% Type I error level, the null hypothesis (equality of means) cannot be rejected.
However, the case is not so clear-cut: the p-value is marginal (very close to 5%), and with a pretty small sample size (120 observations across the two groups) the result is not very conclusive.
The test also relies on the assumption that the distributions of both samples are Gaussian. As we can see in the qq-plots, this assumption does not hold.
Now use the Bayesian approach based on the robust estimation.
Create matrices of combined chains for the two means and two standard deviations of the robust fit.
summary(stanFitRobust)
$summary
mean se_mean sd 2.5% 25%
mu[1] 99.278479 0.02541479 1.757258 95.892836 98.106053
mu[2] 107.135439 0.03759197 2.691692 101.782362 105.356701
sigma[1] 11.324726 0.02646412 1.745481 8.283891 10.114247
sigma[2] 17.871701 0.03961980 2.710205 12.992997 16.008138
nu 3.865007 0.02819681 1.713714 1.906517 2.816141
lp__ -451.375158 0.03470814 1.669462 -455.612524 -452.215782
50% 75% 97.5% n_eff Rhat
mu[1] 99.318108 100.465290 102.634056 4780.774 1.0001983
mu[2] 107.104442 108.951046 112.281358 5126.967 1.0002329
sigma[1] 11.214309 12.404448 15.055811 4350.261 0.9992927
sigma[2] 17.718447 19.571494 23.554055 4679.288 0.9999055
nu 3.506651 4.418275 8.154027 3693.830 1.0003792
lp__ -451.034452 -450.145225 -449.198416 2313.613 1.0004085
$c_summary
, , chains = chain:1
stats
parameter mean sd 2.5% 25% 50%
mu[1] 99.31321 1.794266 95.953253 98.116266 99.321257
mu[2] 106.96172 2.739928 101.452693 105.249842 106.924883
sigma[1] 11.33591 1.739136 8.295615 10.138611 11.217442
sigma[2] 17.87345 2.746624 13.081372 15.908501 17.549351
nu 3.89781 1.758589 1.897855 2.791728 3.522191
lp__ -451.39555 1.654445 -455.559983 -452.246843 -451.069865
stats
parameter 75% 97.5%
mu[1] 100.487503 102.712333
mu[2] 108.803910 112.141803
sigma[1] 12.430281 15.144620
sigma[2] 19.648503 23.691793
nu 4.469162 8.370698
lp__ -450.199223 -449.224826
, , chains = chain:2
stats
parameter mean sd 2.5% 25% 50%
mu[1] 99.359740 1.727473 96.002888 98.183860 99.424128
mu[2] 107.205724 2.710996 101.960121 105.333992 107.264773
sigma[1] 11.328668 1.715013 8.365028 10.140853 11.193846
sigma[2] 17.821753 2.637216 13.271496 15.984582 17.685629
nu 3.789725 1.446564 1.982413 2.796458 3.463222
lp__ -451.324191 1.638725 -455.461092 -452.162620 -450.963830
stats
parameter 75% 97.5%
mu[1] 100.586306 102.610716
mu[2] 109.024107 112.217721
sigma[1] 12.358057 15.070945
sigma[2] 19.416344 23.651606
nu 4.371135 7.443861
lp__ -450.105134 -449.218868
, , chains = chain:3
stats
parameter mean sd 2.5% 25% 50%
mu[1] 99.241089 1.715029 95.621161 98.141599 99.310472
mu[2] 107.198150 2.761142 101.738884 105.413202 107.184480
sigma[1] 11.324729 1.786161 8.129510 10.087825 11.237923
sigma[2] 17.834568 2.789681 12.584832 16.032342 17.729390
nu 3.847617 1.677111 1.854929 2.795824 3.477059
lp__ -451.449925 1.738399 -455.821321 -452.303253 -451.091065
stats
parameter 75% 97.5%
mu[1] 100.432571 102.53221
mu[2] 109.058140 112.43690
sigma[1] 12.455750 15.07465
sigma[2] 19.478207 23.53294
nu 4.352777 8.40313
lp__ -450.148846 -449.15265
, , chains = chain:4
stats
parameter mean sd 2.5% 25% 50%
mu[1] 99.199874 1.788565 95.891846 97.960787 99.213313
mu[2] 107.176156 2.544973 102.262461 105.526175 107.097533
sigma[1] 11.309597 1.742850 8.295669 10.087199 11.227140
sigma[2] 17.957035 2.665734 13.114103 16.071792 17.869635
nu 3.924874 1.935407 1.912542 2.882155 3.564496
lp__ -451.330962 1.643141 -455.604148 -452.095591 -450.986033
stats
parameter 75% 97.5%
mu[1] 100.36449 102.632627
mu[2] 108.81906 112.363129
sigma[1] 12.38710 14.913067
sigma[2] 19.68608 23.495363
nu 4.44298 8.213977
lp__ -450.10343 -449.228235
<-cbind(Mu=rstan::extract(stanFitRobust,pars="mu[1]")$'mu[1]',
y.dis1Sigma=rstan::extract(stanFitRobust,pars="sigma[1]")$'sigma[1]')
<-cbind(Mu=rstan::extract(stanFitRobust,pars="mu[2]")$'mu[2]',
y.dis2Sigma=rstan::extract(stanFitRobust,pars="sigma[2]")$'sigma[2]')
<-density(y.dis1[, "Mu"])
den.Dis1<-density(y.dis2[, "Mu"])
den.Dis2plot(den.Dis1,col="blue", xlim=c(90,120), main="Compare 2 distributions")
lines(den.Dis2,col="red")
Traditional Bayesian approach: look at HDIs.
library(HDInterval)
hdi(cbind(y.dis1[,1], y.dis2[,1]), credMass=.9)
[,1] [,2]
lower 96.40157 102.6255
upper 102.14760 111.4564
attr(,"credMass")
[1] 0.9
hdi(cbind(y.dis1[,2], y.dis2[,2]), credMass=.85)
[,1] [,2]
lower 8.79751 13.88659
upper 13.71127 21.54786
attr(,"credMass")
[1] 0.85
The 95% HDI intervals overlap for both parameters, but at the reduced credible mass levels shown above (90% for the means, 85% for the standard deviations) the two groups can be distinguished.
Mean difference:
paramSampleVec <- y.dis2[,1] - y.dis1[,1]
plotPost(paramSampleVec = paramSampleVec)  # plotPost() comes from Kruschke's DBDA2E-utilities.R
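A direct Bayesian summary of the same question is the HDI of the mean difference together with the posterior probability that the drug group's mean exceeds the placebo group's; a minimal sketch using the objects defined above:
hdi(paramSampleVec, credMass = 0.95)  # HDI of the posterior mean difference
mean(paramSampleVec > 0)              # posterior probability that mu[2] > mu[1]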
Apply the Frequentist approach to the chain samples.
First, check whether the samples for the two standard deviations are significantly different:
c(mean(y.dis1[,2]),mean(y.dis2[,2])) # mean values
[1] 11.32473 17.87170
c(sd(y.dis1[,2]),sd(y.dis2[,2])) # standard deviations of samples of MCMC standard deviations
[1] 1.745481 2.710205
ks.test(y.dis1[,2],y.dis2[,2]) # Kolmogorov-Smirnov test for posterior distributions of standard deviations
Two-sample Kolmogorov-Smirnov test
data: y.dis1[, 2] and y.dis2[, 2]
D = 0.862, p-value < 2.2e-16
alternative hypothesis: two-sided
den <- density(y.dis2[,2])
plot(density(y.dis1[,2]), xlim=c(5,30))
lines(den$x, den$y, col="red")
t.test(y.dis1[,2],y.dis2[,2], var.equal=F, paired=FALSE) #t-test for means of posterior distributions for standard deviations
Welch Two Sample t-test
data: y.dis1[, 2] and y.dis2[, 2]
t = -143.61, df = 8537.3, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.636341 -6.457609
sample estimates:
mean of x mean of y
11.32473 17.87170
Check the shapes of the posterior distributions of the mean and standard deviation parameters. How different are they between the control and treated groups?
For standard deviations:
qqnorm(y.dis1[,2]) # control
qqline(y.dis1[,2])
qqnorm(y.dis2[,2]) # treatment
qqline(y.dis2[,2])
For mean values:
qqnorm(y.dis1[,1]) #control
qqline(y.dis1[,1])
qqnorm(y.dis2[,1]) # treatment
qqline(y.dis2[,1])
The comparison of means and standard deviations of the posterior samples for the standard deviations, the Kolmogorov-Smirnov test, the density plots, and the t-test all indicate that the variances of the two groups are different.
This means that we cannot apply ANOVA to compare the two group means directly.
Try a t-test for the two means of the posterior distributions with different variances:
t.test(y.dis1[,1],y.dis2[,1], var.equal=F, paired=FALSE)
Welch Two Sample t-test
data: y.dis1[, 1] and y.dis2[, 1]
t = -172.83, df = 8605.2, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-7.946073 -7.767847
sample estimates:
mean of x mean of y
99.27848 107.13544
The null hypothesis of equality of the means of the posterior distributions for the two group means is decisively rejected. Keep in mind, though, that we are testing with much longer samples here (thousands of MCMC draws rather than 120 observations), so extremely small p-values are to be expected and should not be over-interpreted.
Plot the images of the two groups in the mean-standard deviation parameter space:
plot(y.dis1,xlim=c(92,118),ylim=c(5,33),col="red",xlab="Mean",ylab="St. Dev.")
points(y.dis2,col="blue")
Using at least two different methods, we thus see a significant difference between the two groups, which is also visible in the parameter-space plot.
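Kruschke (2015, Chapter 16) also suggests summarizing the difference as a posterior effect size, scaling the mean difference by the pooled posterior standard deviations; a short sketch using the matrices above:
# Posterior distribution of the effect size
effSize <- (y.dis2[,1] - y.dis1[,1]) / sqrt((y.dis1[,2]^2 + y.dis2[,2]^2)/2)
hdi(effSize, credMass = 0.95)  # HDI of the effect size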
Kruschke, John K. 2013. "Bayesian Estimation Supersedes the t Test." Journal of Experimental Psychology: General 142 (2): 573-603. doi:10.1037/a0029146.
Kruschke, John K. 2015. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. 2nd ed. Amsterdam: Academic Press, an imprint of Elsevier.